(57) ABSTRACT

A method of maintaining cache-coherency in a multi-processor computer system provides new states to indicate that a sector in an upstream cache has been modified, without executing unnecessary bus transactions for the lower-level cache(s). These new "U" states can indicate which sector in the cache line was modified, or if the cache line was the subject of a cachable write-through operation. The protocol is implemented as an improvement to the prior-art "MESI" cache-coherency protocol. The new protocol is especially useful in handling allocate-and-zero instructions wherein data is modified in the cache (zeroed out) without first fetching the old data from memory. In the embodiment wherein there are only two sectors in a given cache line, three new states are provided to indicate which sector was modified, or whether any cachable write-through operation was performed on the cache line of the first-level cache.

17 Claims, 2 Drawing Sheets 
CACHE-COHERENCY PROTOCOL WITH UPSTREAM UNDEFINED STATE

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to computer systems, and more particularly to a cache-coherency protocol which determines whether a snoop operation is required to be forwarded upstream to a higher-level cache.

2. Description of the Related Art

The basic structure of a conventional multi-processor computer system 10 is shown in FIG. 1. Computer system 10 has several processing units, two of which, 12a and 12b, are depicted, which are connected to various peripheral devices, including input/output (I/O) devices 14 (such as a display monitor, keyboard, graphical pointer (mouse), and a permanent storage device (hard disk)), memory device 16 (such as random access memory or RAM) that is used by the processing units to carry out program instructions, and firmware 18 whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever the computer is first turned on. Processing units 12a and 12b communicate with the peripheral devices by various means, including a generalized interconnect or bus 20, or direct memory-access channels (not shown). Computer system 10 may have many additional components which are not shown, such as serial and parallel ports for connection to, e.g., modems or printers. Those skilled in the art will further appreciate that there are other components that might be used in conjunction with those shown in the block diagram of FIG. 1; for example, a display adapter might be used to control a video display monitor, a memory controller can be used to access memory 16, etc. The computer can also have more than two processing units.

In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical; that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. A typical architecture is shown in FIG. 1. A processing unit includes a processor core 22 having a plurality of registers and execution units, which carry out program instructions in order to operate the computer. An exemplary processing unit includes the PowerPC™ processor marketed by International Business Machines Corporation. The processing unit can also have one or more caches, such as an instruction cache 24 and a data cache 26, which are implemented using high-speed memory devices. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from memory 16. These caches are referred to as "on-board" when they are integrally packaged with the processor core on a single integrated chip 28. Each cache is associated with a cache controller (not shown) that manages the transfer of data between the processor core and the cache memory.

A processing unit can include additional caches, such as cache 30, which is referred to as a level 2 (L2) cache since it supports the on-board (level 1) caches 24 and 26. In other words, cache 30 acts as an intermediary between memory 16 and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. For example, cache 30 may be a chip having a storage capacity of 256 or 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. Cache 30 is connected to bus 20, and all loading of information from memory 16 into processor core 22 must come through cache 30. Although FIG. 1 depicts only a two-level cache hierarchy, multi-level cache hierarchies can be provided where there are many levels (L3, L4, etc.) of serially connected caches.

In an SMP computer, it is important to provide a coherent memory system, that is, to cause writes to each individual memory location to be serialized in some order for all processors. For example, assume a location in memory is modified by a sequence of write operations to take on the values: 1, 2, 3, 4. In a cache-coherent system, all processors will observe the writes to a given location to take place in the order shown. However, it is possible for a processing element to miss a write to the memory location. A given processing element reading the memory location could see the sequence 1, 3, 4, missing the update to the value 2. A system that implements these properties is said to be coherent. Virtually all coherency protocols operate only to the granularity of the size of a cache block. That is to say, the coherency protocol controls the movement of and write permissions for data on a cache block basis and not separately for each individual memory location.

There are a number of protocols and techniques for achieving cache coherence that are known to those skilled in the art. At the heart of all these mechanisms for maintaining coherency is the requirement that the protocols allow only one processor to have a "permission" that allows a write to a given memory location (cache block) at any given point in time. As a consequence of this requirement, whenever a processing element attempts to write to a memory location, it must first inform all other processing elements of its desire to write the location and receive permission from all other processing elements to carry out the write. The key issue is that all other processors in the system must be informed of the write by the initiating processor before the write occurs.

Furthermore, if a block is present in the L1 cache of a given processing unit, it is also present in the L2 and L3 caches of that processing unit. This property is known as inclusion and is well-known to those skilled in the art. Henceforth, it is assumed that the principle of inclusion applies to the caches related to the present invention.

To implement cache coherency in a system, the processors communicate over a common generalized interconnect (i.e., bus 20). The processors pass messages over the interconnect indicating their desire to read or write memory locations. When an operation is placed on the interconnect, all of the other processors "snoop" (monitor) this operation and decide if the state of their caches can allow the requested operation to proceed and, if so, under what conditions. There are several bus transactions that require snooping and follow-up action to honor the bus transactions and maintain memory coherency. The snooping operation is triggered by the receipt of a qualified snoop request, generated by the assertion of certain bus signals.

This communication is necessary because, in systems with caches, the most recent valid copy of a given block of memory may have moved from the system memory 16 to one or more of the caches in the system (as mentioned above). If a processor (say 12a) attempts to access a memory location not present within its cache hierarchy, the correct version of the block, which contains the actual (current) value for the memory location, may either be in the system memory 16 or in one or more of the caches in another processing unit, e.g., processing unit 12b. If the correct version is in one or more of the other caches in the system, it is necessary to obtain the correct value from the cache(s) in the system instead of system memory.
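The write-ordering requirement described above (writes of 1, 2, 3, 4 may be partly missed by a reader, but never seen out of order) can be stated as a simple check. The following Python sketch is illustrative only and is not part of the disclosure: the function name `is_coherent_observation` is hypothetical, and it assumes every write stores a distinct value so that an observed value identifies exactly one write.

```python
def is_coherent_observation(write_order, observed):
    """Check one reader's view of a single memory location against the
    serialized order of writes to that location.

    Assumption (for illustration only): every write stores a distinct value,
    so each observed value identifies exactly one write.
    """
    position = {value: index for index, value in enumerate(write_order)}
    latest_seen = -1
    for value in observed:
        index = position.get(value)
        if index is None:
            return False  # the reader saw a value that was never written
        if index < latest_seen:
            return False  # the reader saw an older write after a newer one
        latest_seen = index  # repeated reads and skipped writes are allowed
    return True


# The example from the text: writes 1, 2, 3, 4 and a reader that misses the 2.
print(is_coherent_observation([1, 2, 3, 4], [1, 3, 4]))     # True
print(is_coherent_observation([1, 2, 3, 4], [1, 3, 2, 4]))  # False
```

A reader that sees 1, 3, 4 passes because skipping a write is permitted; a reader that sees 3 and then 2 fails because it observed the writes in an order no serialization allows.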
For example, consider a processor, say 12a, attempting to read a location in memory. It first polls its own L1 cache (24 or 26). If the block is not present in the L1 cache, the request is forwarded to the L2 cache (30). If the block is not present in the L2 cache, the request is forwarded on to lower cache levels if present, e.g., the L3 cache. If the block is not present in the lower-level caches, the request is then presented on the generalized interconnect (20) to be serviced. Once an operation has been placed on the generalized interconnect, all other processing units snoop the operation and determine if the block is present in their caches. If a given processing unit has the block of data requested by the initiating processing unit in its L1 cache and that data is modified, then by the principle of inclusion the L2 cache and any lower-level caches also have copies of the block (however, their copies may be stale, since the copy in the processor's cache is modified). Therefore, when the lowest-level cache (e.g., L3) of the processing unit snoops the read operation, it will determine that the block requested is present and modified in a higher-level cache. When this occurs, the L3 cache places a message on the generalized interconnect informing the initiating processing unit that it must "retry" its operation at a later time, because the actual value of the memory location is in the L1 cache at the top of the memory hierarchy and must be retrieved to make it available to service the read request of the initiating processing unit.

Once the request from the initiating processing unit has been retried, the L3 cache begins a process to retrieve the modified data from the L1 cache and make it available at the L3 cache, main memory or both, depending on the exact details of the implementation, which are not specifically relevant to this invention. To retrieve the block from the higher-level caches, the L3 cache sends messages through the inter-cache connections to the higher-level caches, requesting that the block be retrieved. These messages propagate up the processing unit hierarchy until they reach the L1 cache and cause the block to be moved down the hierarchy to the lowest level (the L3 cache or main memory) to be able to service the request from the initiating processing unit.

The initiating processing unit eventually re-presents the read request on the generalized interconnect. At this point, however, the modified data has been retrieved from the L1 cache of a processing unit and the read request from the initiating processor will be satisfied. The scenario just described is commonly referred to as a "snoop push." A read request is snooped on the generalized interconnect which causes the processing unit to "push" the block to the bottom of the hierarchy to satisfy the read request made by the initiating processing unit.

The key point to note is that when a processor wishes to read or write a block, it must communicate that desire with the other processing units in the system in order to maintain cache coherence. To achieve this, the cache-coherence protocol associates with each block in each level of the cache hierarchy a status indicator indicating the current "state" of the block. The state information is used to allow certain optimizations in the coherency protocol that reduce message traffic on the generalized interconnect and the inter-cache connections. As one example of this mechanism, when a processing unit executes a read, it receives a message indicating whether or not the read must be retried later. If the read operation is not retried, the message usually also includes information allowing the processing unit to determine if any other processing unit also has a still active copy of the block (this is accomplished by having the other lowest-level caches give a "shared" or "not shared" indication for any read they do not retry). Therefore, a processing unit can determine whether any other processor in the system has a copy of the block. If no other processing unit has an active copy of the block, the reading processing unit marks the state of the block as "exclusive." If a block is marked exclusive, it is permissible to allow the processing unit to later write the block without first communicating with other processing units in the system because no other processing unit has a copy of the block. Therefore, in general, it is possible for a processor to read or write a location without first communicating this intention onto the interconnection. However, this only occurs in cases where the coherency protocol has ensured that no other processor has an interest in the block.

The foregoing cache-coherency technique is implemented in a specific protocol referred to as "MESI," and is illustrated in FIG. 2. In this protocol, a cache block can be in one of four states, "M" (Modified), "E" (Exclusive), "S" (Shared) or "I" (Invalid). Under the MESI protocol, each cache entry (e.g., a 32-byte sector) has two bits which indicate the state of the entry, out of the four possible states. Depending upon the initial state of the entry and the type of access sought by the requesting processor, the state may be changed, and a particular state is set for the entry in the requesting processor's cache. For example, when a sector is in the Modified state, the addressed sector is valid only in the cache having the modified sector, and the modified data has not been written back to system memory. When a sector is Exclusive, it is present only in the noted cache, and is consistent with system memory. If a sector is Shared, it is valid in that cache and in at least one other cache, all of the shared sectors being consistent with system memory. Finally, when a sector is Invalid, it indicates that the addressed sector is not resident in the cache. As seen in FIG. 2, if a sector is in any of the Modified, Shared or Invalid states, it can move between the states depending upon the particular bus transaction. While a sector in an Exclusive state can move to any other state, a sector can only become Exclusive if it is first Invalid.

One of the difficulties of maintaining SMP performance as processor speeds improve is the increased load on the system memory bus. One way to lessen that impact is to increase the width of the bus and the amount of data transferred with each transaction (the "transfer burst size"). Unfortunately, this transfer size becomes the cache line size and coherency size for the system and impacts the software model if it has cache-controlling instructions, as most reduced instruction set computing (RISC) processors do. In order to prevent impacting the software, a sectored cache is implemented between the processor and the system bus. The sectored cache has a line size equal to the memory and system transfer size, with a sector size equal to the processor cache line size. This construction solves the software impact problem, but raises several design issues for the lower-level cache which is trying to maintain inclusivity and coherency.

First, whenever the higher-level cache (L1) executes a particular instruction referred to as an allocate-and-zero instruction ("DCBZ" in the PowerPC™ instruction set), it is modifying data in its cache (zeroing it) without first fetching the old data from memory. This operation is commonly performed when reallocating memory areas to a new process. The lower-level cache also needs to allocate and zero its cache line, but it has a larger cache line. The conventional method of implementing this procedure is to read the larger line from memory and then zero out the portion corresponding to the processor cache line. This approach, however, defeats the entire purpose of the operation, which is to avoid reading data from memory that is going to be reallocated
anyway. Furthermore, it is likely that the processor will, in a very short time span, allocate-and-zero additional cache lines which would fall into the remaining portion of the larger cache line in the lower-level cache (although the lower-level cache cannot assume this is the case). So the first problem is keeping track of sectors that are valid in the higher-level (e.g., L1) cache but are not yet valid in lower-level caches (e.g., L2 or L3).

A second problem is how to efficiently support cachable write-through operations, which are commonly used by, e.g., graphic device drivers (such as for a video display monitor). A large amount of data is often referenced in this case, but only a small amount is modified. The modified data is required to be visible to the graphics device outside of the processor in a timely manner, so the cachable write-through protocol is used. This protocol requires allocating the line containing the modified data in all levels of caches to maintain inclusion, but the write-through operation complicates the implementation: keeping the caches consistent would require either partial-line-write capability (an expensive and complicated feature) or flushing and invalidating the line when it is written, which would negatively impact performance since the line needs to be fetched again the next time it is referenced.

It would, therefore, be desirable to devise a method of indicating that a cache line is allocated and valid upstream of a given cache level, while undefined at that level, in order to avoid unnecessary bus operations. It would be further advantageous if the method could efficiently handle the rare cases where a snoop hit occurs against such an upstream modified sector.

SUMMARY OF THE INVENTION

It is therefore one object of the present invention to provide an improved method of maintaining cache coherency in a multi-processor computer system having sectored lower-level caches.

It is another object of the present invention to provide such a method that improves performance of zero allocation operations on cache lines.

It is yet another object of the present invention to provide such a method that additionally supports write-through cache operations without providing a complicated partial-write capability.

The foregoing objects are achieved in a method of maintaining cache coherency in a multi-processor computer system, generally comprising the steps of loading a first value into a cache line block in a first-level cache of a processing unit, and into a sector of a cache line in a second-level cache of the processing unit, then modifying the value in the cache line block in the first-level cache of the processing unit, and indicating at the second-level cache that the sector of the cache line in the second-level cache has been modified upstream. This indication is made without modifying the sector of the cache line in the second-level cache. This procedure can be performed in response to an allocate-and-zero (DCBZ) instruction which zeros out the cache line block of the first cache level of the processing unit. The indicating step includes an indication of which sector in a plurality of sectors in the cache line in the second-level cache corresponds to the cache line block in the first-level cache that was modified. In the embodiment wherein there are only two sectors in a given cache line, three new states are provided to indicate which sector was modified, or whether any cachable write-through operation was performed on the cache line of the first-level cache. With this new protocol, sectors that are valid in higher levels may be properly tracked without executing unnecessary bus operations, and cachable write-through operations are more efficiently supported.

The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of a prior-art multi-processor computer system;

FIG. 2 is a state diagram depicting a prior-art cache-coherency protocol (MESI); and

FIG. 3 is a state diagram depicting one embodiment of the cache-coherency protocol of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention is directed to a method of maintaining cache coherency in a multi-processor system, such as the system of FIG. 1, but the present invention could be applied to computer systems that are not necessarily conventional, i.e., they could include new hardware components not shown in FIG. 1, or have a novel interconnection architecture for existing components. Therefore, those skilled in the art will appreciate that the present invention is not limited to the generalized system shown in that figure.

With reference now to FIG. 3, there is depicted a state diagram of one embodiment of the cache-coherency protocol of the present invention. This protocol is similar to the prior-art MESI protocol of FIG. 2, in that it includes the same four states (Modified, Exclusive, Shared and Invalid), but also includes three new "U" states, for upstream undefined sector, as explained further below; this new protocol is referred to herein as the "U-MESI" protocol. As with the prior-art protocol, the four M-E-S-I states may change based on the initial state of the entry and the type of access sought by the requesting processor. The manner in which these four states change is generally identical to the prior-art MESI protocol, with the exceptions noted below.

In the depicted embodiment, the U-MESI protocol is adapted for a cache having cache lines with two sectors. In this embodiment, there are three "U" states due to the three possible cases wherein: (1) the first of the two sectors (the "odd" sector) is modified; (2) the second of the two sectors (the "even" sector) is modified; and (3) neither of the sectors is modified (they are both shared as a result of a cachable write-through operation). The first of these states is referred to herein as "U_MI," the second as "U_IM," and the third as "U_SS." In this implementation of the U-MESI protocol, each cache entry now has three bits which indicate the state of the entry, out of the seven possible states (the four prior-art states, and the three new "U" states). If more than two sectors were provided in a cache line, then additional "U" states would be required (and additional bits in the cache entry).

Table 1 shows the cache transitions involving the highest level (L1) operations:
TABLE 1

     Highest Level (L1) Operation   Lower Level Cache Transition
  1  DCBZ, even sector              I → U_IM
  2  DCBZ, odd sector               I → U_MI
  3  DCBZ, even sector              U_MI → M
  4  DCBZ, odd sector               U_IM → M
  5  Read/RWITM                     U_MI or U_IM → I
  6  Any L1 "hit"                   U_SS → U_SS
  7  Cachable write-through         I → U_SS
  8  Any other operation            normal MESI

In the first of the entries in Table 1, when a DCBZ operation (which is a write-type operation) is performed on an even sector (the second sector) in the cache line of the L1 cache, any corresponding lower-level caches in the "I" (Invalid) state will undergo a transition to "U_IM," i.e., only the second sector is noted as being modified. In the second entry in Table 1, when a DCBZ operation is performed on an odd sector (the first sector) in the cache line of the L1 cache, any corresponding lower-level caches in the "I" (Invalid) state will undergo a transition to "U_MI," i.e., only the first sector is noted as being modified.

If a DCBZ operation is performed on an even sector when the odd sector of the same line has previously undergone a DCBZ operation and the corresponding lower-level caches are in the "U_MI" state (the third entry in Table 1), or if a DCBZ operation is performed on an odd sector when the even sector of the same line has previously undergone a DCBZ operation and the corresponding lower-level caches are in the "U_IM" state (the fourth entry in Table 1), then the lower-level caches will undergo a state transition to the "M" (Modified) state to indicate that the entire line is modified. If, however, only one DCBZ operation has previously occurred for a given line, the lower-level caches have that line in a "U_MI" or "U_IM" state, and the other (Invalid) sector is the subject of a "read" or "read with intent to modify" (RWITM) operation (the fifth entry in Table 1), then the lower-level cache lines go to "I" (Invalid), and the modified sector (M-sector) is flushed from the higher-level cache.

In the sixth entry of Table 1, if an L1 "hit" occurs against the subject block and the lower-level caches are in the "U_SS" state, they will remain in that state, i.e., the line is treated as if it were invalid, but not cached. If a cachable write-through operation is performed on the block (the seventh entry in Table 1), and the lower-level caches have the corresponding block in an "I" (Invalid) state, then they undergo a transition to the "U_SS" state. Finally, as noted in entry eight of Table 1, all other L1 operations not specified above undergo a normal transition, i.e., according to the prior-art MESI protocol.
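Table 1 can be read as a transition function for the lower-level cache. The Python sketch below is illustrative only: the enum, table, and function names are hypothetical, the prior-art MESI rules for entry 8 are supplied by the caller rather than implemented, and the flush flag on entry 5 reflects the statement that the modified sector is flushed from the higher-level cache. It also checks the three-bit encoding described earlier: seven states fit in three bits, where the four MESI states need two.

```python
from enum import Enum


class State(Enum):
    """Lower-level (e.g., L2) state of a two-sector line under U-MESI."""
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"
    U_MI = "odd (first) sector modified upstream"
    U_IM = "even (second) sector modified upstream"
    U_SS = "line shared upstream after a cachable write-through"


class L1Op(Enum):
    """Highest-level (L1) operations named in Table 1 (labels are illustrative)."""
    DCBZ_EVEN = "DCBZ, even sector"
    DCBZ_ODD = "DCBZ, odd sector"
    READ = "Read"
    RWITM = "RWITM"
    L1_HIT = "Any L1 hit"
    CACHABLE_WRITE_THROUGH = "Cachable write-through"
    OTHER = "Any other operation"


# Seven states need (7 - 1).bit_length() == 3 bits; MESI alone needs 2.
STATE_BITS = (len(State) - 1).bit_length()

# (current lower-level state, L1 operation) -> (next state, flush upstream M-sector?)
# Assumption: a Read/RWITM that reaches the lower level targets the other,
# Invalid sector, since the modified sector would hit in the L1 cache.
TABLE_1 = {
    (State.I, L1Op.DCBZ_EVEN): (State.U_IM, False),               # entry 1
    (State.I, L1Op.DCBZ_ODD): (State.U_MI, False),                # entry 2
    (State.U_MI, L1Op.DCBZ_EVEN): (State.M, False),               # entry 3
    (State.U_IM, L1Op.DCBZ_ODD): (State.M, False),                # entry 4
    (State.U_MI, L1Op.READ): (State.I, True),                     # entry 5
    (State.U_MI, L1Op.RWITM): (State.I, True),                    # entry 5
    (State.U_IM, L1Op.READ): (State.I, True),                     # entry 5
    (State.U_IM, L1Op.RWITM): (State.I, True),                    # entry 5
    (State.U_SS, L1Op.L1_HIT): (State.U_SS, False),               # entry 6
    (State.I, L1Op.CACHABLE_WRITE_THROUGH): (State.U_SS, False),  # entry 7
}


def next_lower_level_state(state, op, mesi_transition):
    """Return (next_state, flush_upstream) for an L1 operation.

    Pairs not listed in Table 1 fall through to `mesi_transition`, a
    caller-supplied (hypothetical) function implementing the prior-art
    MESI rules (entry 8).
    """
    if (state, op) in TABLE_1:
        return TABLE_1[(state, op)]
    return mesi_transition(state, op), False


def mesi_unchanged(state, op):
    """Placeholder for the real MESI rules: leaves the state unchanged."""
    return state


# Both sectors of an Invalid line are zeroed in L1, one after the other.
state = State.I
state, _ = next_lower_level_state(state, L1Op.DCBZ_ODD, mesi_unchanged)
print(state)  # State.U_MI
state, _ = next_lower_level_state(state, L1Op.DCBZ_EVEN, mesi_unchanged)
print(state)  # State.M
```

A dictionary keyed on (state, operation) keeps the sketch a direct transcription of the table; a hardware implementation would of course decode the three state bits instead.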

Table 2 shows how system bus snooped transactions will 
influence the caches in the "U" states: 



TABLE 2

     Bus Operation          Snooper State   Coherency Response
  1  Any snoop "hit"        U_MI            Retry
  2  Any snoop "hit"        U_IM            Retry
  3  Non-read snoop "hit"   U_SS            Retry
  4  Read snoop "hit"       U_SS            Shared

In the "U" states, the cache knows it must take action but must forward the snoop upstream to determine the proper action. Table 2 shows only those rare cases where a snoop hit occurs against one of the "U" states. In these situations, the lower-level cache will flush the contents of the upstream cache, move to the "I" (Invalid) state, and issue a "Retry" response, except where a read snoop hit occurs against a "U_SS" state, in which case the coherency response is "Shared."

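Table 2 reduces to a small decision function. In the following Python sketch the type and function names are hypothetical and not taken from the patent: any snoop hit on U_MI or U_IM, and any non-read hit on U_SS, flushes the upstream cache, invalidates the lower-level entry and answers Retry; only a read hit on U_SS answers Shared. The text does not state what happens to a U_SS entry after a read snoop, so leaving it in place is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class UState(Enum):
    U_MI = "odd (first) sector modified upstream"
    U_IM = "even (second) sector modified upstream"
    U_SS = "line shared upstream after a cachable write-through"


@dataclass(frozen=True)
class SnoopOutcome:
    response: str         # coherency response placed on the bus
    flush_upstream: bool  # push the upstream contents down the hierarchy
    invalidate: bool      # move the lower-level entry to "I"


def snoop_hit_on_u_state(state: UState, is_read: bool) -> SnoopOutcome:
    """Outcome of a snooped bus operation that hits a "U" state (Table 2)."""
    if state is UState.U_SS and is_read:
        # Entry 4: neither sector is modified upstream, so the line can be shared.
        # Table 2 gives only the response; keeping the entry is an assumption.
        return SnoopOutcome("Shared", flush_upstream=False, invalidate=False)
    # Entries 1-3: any hit on U_MI or U_IM, or a non-read hit on U_SS.
    return SnoopOutcome("Retry", flush_upstream=True, invalidate=True)


for u_state in UState:
    for is_read in (True, False):
        outcome = snoop_hit_on_u_state(u_state, is_read)
        print(u_state.name, "read" if is_read else "non-read", outcome.response)
```

Because a snoop hit on a "U" state is expected to be rare, the sketch spends no effort on a fast path; it simply mirrors the four rows of the table.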
With the foregoing U-MESI protocol, both of the problems mentioned above are solved, i.e., keeping track of sectors that are valid in higher levels without executing unnecessary bus operations, and efficiently supporting cachable write-through operations. The results are increased memory bandwidth and the freeing up of address bandwidth, as well as byte-write capability.
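The memory-bandwidth benefit can be illustrated with a toy count of line fetches for a stream of DCBZ operations. This Python sketch is a simplified model under stated assumptions, not a measurement: every target line starts Invalid in the lower-level cache, nothing else touches those lines, the conventional scheme reads the whole larger line from memory on the first DCBZ to it (as described in the Related Art), and the U-state scheme only records the upstream modification. The function name and parameters are hypothetical.

```python
def count_line_fetches(dcbz_stream, use_u_states, sectors_per_line=2):
    """Toy model: memory line fetches issued by the lower-level cache.

    dcbz_stream: (line_address, sector) pairs zeroed by the L1 cache.
    Returns (line_fetches, lines_fully_zeroed).
    """
    zeroed = {}      # line -> sectors zeroed upstream so far
    fetched = set()  # lines the conventional scheme has already read
    fetches = 0
    for line, sector in dcbz_stream:
        if not use_u_states and line not in fetched:
            # Conventional: read the larger line, then zero one sector of it.
            fetches += 1
            fetched.add(line)
        # Both schemes end with the sector zeroed; with U states the lower
        # level only records it (I -> U_MI/U_IM -> M) without a memory read.
        zeroed.setdefault(line, set()).add(sector)
    fully_zeroed = sum(1 for s in zeroed.values() if len(s) == sectors_per_line)
    return fetches, fully_zeroed


stream = [(0, 0), (0, 1), (1, 0), (1, 1)]  # two lines, both sectors zeroed
print(count_line_fetches(stream, use_u_states=False))  # (2, 2)
print(count_line_fetches(stream, use_u_states=True))   # (0, 2)
```

In this model both schemes finish with the same fully zeroed lines, but only the conventional one spends a memory read per line on data that is immediately overwritten.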

Although the invention has been described with reference 
to specific embodiments, this description is not meant to be 
construed in a limiting sense. Various modifications of the 
disclosed embodiment, as well as alternative embodiments 
of the invention, will become apparent to persons skilled in 
the art upon reference to the description of the invention. It 
is therefore contemplated that such modifications can be 
made without departing from the spirit or scope of the 
present invention as defined in the appended claims. 

What is claimed is: 

1. A method of maintaining cache coherency in a multi-
processor computer system having a plurality of processing
units, each processing unit having a cache hierarchy includ- 
ing at least first- and second-level caches, wherein the first 
cache level is upstream of the second cache level, the 
method comprising the steps of: 

loading a first value into a cache line block in a first-level
cache of a processing unit, and into a sector of a cache
line in a second-level cache of the processing unit,
wherein the cache line in the second-level cache is
comprised of a plurality of sectors corresponding to
separate cache line blocks in the first-level cache;
modifying the value in the cache line block in the first-level cache of the processing unit; and
indicating at the second-level cache that the sector of the
cache line in the second-level cache has been modified
at an upstream cache without modifying the sector of
the cache line in the second-level cache.

2. The method of claim 1 wherein said modifying step 
zeros out the cache line block of the first-level cache of the 
processing unit. 

3. The method of claim 1 further comprising loading at 
least one other value into another cache line block of the 
first-level cache that corresponds to the sector of the cache 
line in the second-level cache. 

4. The method of claim 1 further comprising the step of 
responding to an inquiry from a second processing unit 
regarding a request to access a memory block corresponding 
to the first value. 

5. The method of claim 1 wherein said indicating step 
includes an indication of which sector in a plurality of 
sectors in the cache line in the second-level cache corre- 
sponds to the cache line block in the first-level cache that 
was modified. 

6. The method of claim 5 comprising the further step of 
indicating that the cache line of the first-level cache is 
invalid in response to an attempt to access the upstream 
modified cache line. 

7. The method of claim 7 wherein said indicating step 
includes an indication of any cachable write-through opera- 
tion performed on the cache line of the first-level cache. 

8. The method of claim 1 wherein a cache line in the 
second-level cache has only two sectors, and said indicating 
step includes an indication of which of the two sectors 
corresponds to the cache line block in the first-level cache that was modified.

9. The method of claim 8 comprising the further step of indicating that the cache line of the first-level cache is invalid in response to an attempt to access the upstream modified cache line.

10. The method of claim 1 wherein said indicating step includes an indication of any cachable write-through operation performed on the cache line of the first-level cache.

11. A computer system comprising:
a memory device;
a bus connected to said memory device; and
a plurality of processing units connected to said bus, each processing unit having at least a first-level cache and a second-level cache, each of said caches having a plurality of cache lines, and each of said cache lines in said second-level cache having a plurality of sectors, wherein the sectors respectively correspond to separate cache line blocks in the first-level cache, said processing units each further having means for providing an indication of when a given sector of a given cache line in said second-level caches has been modified at an upstream cache without modifying said given sector.

12. The computer system of claim 11 wherein each said processing unit further responds to an inquiry from another processing unit regarding a request to access a memory block corresponding to said given sector by forwarding said indication.

13. The computer system of claim 11 wherein:
each said cache line in said first-level cache has a plurality of sectors; and
each said cache line in said second-level cache has a larger number of sectors than each said cache line in said first-level cache.

14. The computer system of claim 11 wherein said means for providing said indication includes means for indicating which sector in said plurality of sectors in said cache line in said second-level cache corresponds to a block in said first-level cache that was modified.

15. The computer system of claim 11 wherein said means for providing said indication further includes means for indicating when any cachable write-through operation is performed on a cache line of said first-level cache.

16. The computer system of claim 11 wherein said means for providing said indication further includes means for indicating that a cache line of said first-level cache, corresponding to said given cache line of said second-level cache, is invalid in response to an attempt to access said cache line of said first-level cache.

17. The computer system of claim 16 wherein said means for providing said indication further includes means for indicating when any cachable write-through operation is performed on said cache line of said first-level cache.